MapReduce for Experimental Search
Authors
Abstract
This draft report presents preliminary results for the TREC 2010 ad hoc web search task. We ran our MIREX system on 0.5 billion web documents from the ClueWeb09 crawl. Using a simple index built from anchor texts and page titles, combined with spam removal, the system retrieves on average at least 3 relevant documents on the first result page of 10 results.
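The abstract compresses two technical points. The index is built from the anchor texts and page titles of each page, with spam pages removed. "At least 3 relevant documents on the first result page of 10" corresponds to a precision at 10 (P@10) of at least 0.3. The following Python sketch illustrates both under stated assumptions. It simulates a MapReduce job sequentially in memory instead of running on Hadoop. The record layout (`Page`), the `spam` URL set, and every function name are hypothetical and are not taken from MIREX. Anchor text is keyed by the target page it describes, which is the usual way anchor-text representations are built. Removing spam pages both as link sources and as index entries is an assumed design choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical input record: (url, title, outlinks), where outlinks is a list of
# (target_url, anchor_text) pairs. Real ClueWeb09 WARC records are far richer.
Page = Tuple[str, str, List[Tuple[str, str]]]


def map_page(page: Page) -> Iterator[Tuple[str, str]]:
    """Map step: emit (url, text) pairs for the page title and each anchor text."""
    url, title, outlinks = page
    if title:
        yield url, title
    for target, anchor in outlinks:
        if anchor:
            # Anchor text describes the *target* page, so key it by the target URL.
            yield target, anchor


def reduce_url(url: str, texts: Iterable[str]) -> Tuple[str, str]:
    """Reduce step: concatenate all title and anchor snippets collected for one URL."""
    return url, " ".join(texts)


def build_representations(pages: Iterable[Page], spam: Set[str]) -> Dict[str, str]:
    """Sequential stand-in for a MapReduce job: map, shuffle (group by key), reduce.

    Assumption: URLs in the hypothetical `spam` set are dropped both as sources
    of anchor text and as entries in the resulting index.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for page in pages:
        if page[0] in spam:
            continue  # spam page: contributes no title or anchor text
        for key, value in map_page(page):
            if key not in spam:
                groups[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_url(url, texts) for url, texts in sorted(groups.items()))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant (P@10 when k = 10)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


if __name__ == "__main__":
    pages = [
        ("a.example", "Twente IR group", [("b.example", "MapReduce tutorial")]),
        ("b.example", "Hadoop notes", []),
        ("spam.example", "cheap pills", [("b.example", "buy now")]),
    ]
    print(build_representations(pages, spam={"spam.example"}))
    # {'a.example': 'Twente IR group', 'b.example': 'MapReduce tutorial Hadoop notes'}

    # 3 relevant documents among the first 10 results gives P@10 = 0.3.
    print(precision_at_k([f"d{i}" for i in range(1, 11)], {"d1", "d4", "d7"}))
    # 0.3
```

On a real cluster, the grouping loop would be performed by the framework's shuffle phase, and `map_page` and `reduce_url` would become the mapper and reducer.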
Related articles
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The current Hadoop implementation and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster, in which each node has the same computing capacity and the same workload is assigned to each node. Default Hadoop d...
University of Twente at TREC 2010: MapReduce for Experimental Search
A parallel tag affinity computation for social tagging systems using MapReduce
Tag affinity is the relationship between tags. It is useful information for search and recommendation in social tagging systems. Tag affinity is measured by several types of tag co-occurrence frequency. Computing tag affinity becomes increasingly time-consuming as tagging information accumulates. To alleviate this problem, we propose a parallel tag affinity computation method using MapRed...
Query-driven Frequent Co-occurring Term Extraction over Relational Data using MapReduce
In this paper we study how to efficiently compute frequent cooccurring terms (FCT) in the results of a keyword query in parallel using the popular MapReduce framework. Taking as input a keyword query q and an integer k, an FCT query reports the k terms that are not in q, but appear most frequently in the results of the keyword query q over multiple joined relations. The returned terms of FCT se...
MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs
MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In...
Publication date: 2010